Skip to content

Automated Test: dual-storage-enhanced #325

Closed

Conversation

admin-coderabbit
Copy link
Owner

@admin-coderabbit admin-coderabbit commented Feb 4, 2026

This pull request was automatically created by @coderabbitai/e2e-reviewer.

Batch created pull request.

Summary by CodeRabbit

  • New Features

    • Enhanced Mode 3 dual-write with asynchronous legacy operations and 10-second timeouts for better performance.
    • Added comprehensive logging and per-operation metrics recording across all dual-write paths.
  • Tests

    • Expanded test coverage for Mode 3 across multiple storage backends (file, unified, etcd).

* Dual writer: mode 3

* Add integration tests for playlits in mode 3

* Remove todo

* Update pkg/apiserver/rest/dualwriter_mode3.go

Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>

* Admin: Fixes an issue where user accounts could not be enabled (#88117)

Fix: unable to enable user

* [REVIEW] FInish mode 3 and add tests

* Improve logging

* Update dependencies

* Update pkg/apiserver/rest/dualwriter_mode3_test.go

Co-authored-by: maicon <maiconscosta@gmail.com>

* remove test assertion

* Use mode log when dual writer is initiated

---------

Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: gonvee <gonvee@qq.com>
Co-authored-by: maicon <maiconscosta@gmail.com>
@coderabbit-eval
Copy link

coderabbit-eval bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

The changes introduce a Mode3 dual-writer implementation with asynchronous legacy storage operations executed via background goroutines with 10-second timeouts. A new generic NewDualWriter constructor replaces mode-specific constructors. Operations return Storage results immediately while scheduling legacy writes asynchronously, with duration recording and enhanced logging across all paths.

Changes

Cohort / File(s) Summary
Mode3 Implementation
pkg/apiserver/rest/dualwriter_mode3.go
Refactored Mode3 with asynchronous legacy operations, added imports for errors and time, introduced mode3Str logging constant, enhanced all CRUD methods (Create, Get, List, Delete, Update, DeleteCollection) to record operation duration and spawn background goroutines for legacy writes with 10s timeouts.
Mode3 Test Suite Overhaul
pkg/apiserver/rest/dualwriter_mode3_test.go
Replaced commented test suite with comprehensive tests using real dual-writer via new NewDualWriter constructor; added exported function NewDualWriter(mode, ls, us, p) and removed NewDualWriterMode3; tests exercise all CRUD paths with mock-backed storages and Prometheus metrics.
Test Infrastructure Updates
pkg/apiserver/rest/dualwriter_mode1_test.go, pkg/tests/apis/playlist/playlist_test.go
Removed local Prometheus registry initialization in Mode1 test; added integration test cases for Mode3 across file, unified, and etcd storage backends with KubernetesPlaylists and UnifiedStorage feature flags.

Sequence Diagram

sequenceDiagram
    participant Client
    participant DualWriter
    participant Storage
    participant LegacyStorage
    
    Client->>DualWriter: Create(obj)
    DualWriter->>Storage: Create(obj)
    activate Storage
    Storage-->>DualWriter: (result, err)
    deactivate Storage
    
    alt Storage success
        DualWriter-->>Client: (result, nil)
        Note over DualWriter: Spawn async goroutine
        DualWriter->>LegacyStorage: Create(obj) [10s timeout]
        activate LegacyStorage
        LegacyStorage-->>DualWriter: (result, err)
        deactivate LegacyStorage
        Note over DualWriter: Record metrics, log outcome
    else Storage error
        DualWriter-->>Client: (nil, err)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Mode3 awakens with async grace,
Legacy writes sprint the background race!
Storage first, then heritage calls—
Ten seconds to echo through data halls.
Metrics dance, logs sing bright,
Dual paths harmonize in flight! 🎵

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Automated Test: dual-storage-enhanced' is vague and does not clearly communicate the main changes; it uses a generic prefix and relies on branch naming rather than describing what was actually changed. Use a more descriptive title that summarizes the primary changes, such as 'Add Mode3 dual-writer tests and implement async legacy storage operations' or similar that reflects the actual code modifications.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dual-storage-enhanced

Comment @coderabbit eval help to get the list of available commands and usage tips.

@coderabbit-eval coderabbit-eval bot closed this Feb 4, 2026
Copy link

@coderabbit-eval coderabbit-eval bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@pkg/apiserver/rest/dualwriter_mode3.go`:
- Around line 161-167: The goroutine handling the legacy DeleteCollection
mistakenly calls d.recordStorageDuration instead of d.recordLegacyDuration;
update the call in the anonymous goroutine that invokes
d.Legacy.DeleteCollection(...) to call d.recordLegacyDuration(err != nil,
mode3Str, options.Kind, method, startLegacy) so the legacy path's timing is
recorded consistently with Create/Delete/Update.
- Around line 50-57: The background goroutine for legacy operations is using the
incoming request context (context.WithTimeoutCause(ctx,...)) so it gets
cancelled when the HTTP handler returns; change the parent to a detached context
(e.g., context.Background() or a background context that copies only needed
values like trace IDs) when creating the timeout context for d.Legacy.Create,
and apply the same pattern for the other background goroutines that call
d.Legacy.Delete, d.Legacy.Update, and d.Legacy.DeleteCollection; ensure you
still set the 10s timeout, call cancel() via defer, and keep the call to
d.recordLegacyDuration unchanged.
- Around line 43-48: In dualwriter_mode3.go change the error-path metric
recorder to the storage recorder: when Storage.Create fails (the block that
currently calls d.recordLegacyDuration(true, mode3Str, options.Kind, method,
startStorage) after log.Error) replace that call with
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage); do
the same fix in the Update path where the storage error currently records legacy
duration (replace the d.recordLegacyDuration call with d.recordStorageDuration
using the same parameters).
- Around line 94-106: The Delete method uses klog.NewContext(ctx, d.Log) instead
of the enriched local logger and passes name where kind should be recorded;
change the context call to klog.NewContext(ctx, log) so the enriched logger
(log) is used, and update the metric call d.recordStorageDuration(false,
mode3Str, /*kind*/ options.Kind, method, startStorage) to pass options.Kind
instead of name; these fixes are in the Delete function and involve the symbols
klog.NewContext, d.Log, log, and d.recordStorageDuration.
- Around line 125-132: The storage update error path in Update currently calls
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage);
replace that call with d.recordStorageDuration(true, mode3Str, options.Kind,
method, startStorage) so the storage duration metric is recorded on storage
errors (match the success call to recordStorageDuration and other methods like
Delete/DeleteCollection).
🧹 Nitpick comments (3)
pkg/apiserver/rest/dualwriter_mode3_test.go (3)

47-77: Async legacy operations in goroutines are never verified by tests.

The happy-path Create test sets up a legacy mock expectation (Line 31), but the goroutine spawned in DualWriterMode3.Create runs asynchronously. Since the test doesn't synchronize with the goroutine (e.g., via time.Sleep, a channel, or mock.AssertExpectations), the test can complete before the legacy call fires. This means the legacy dual-write path is effectively untested across all Mode3 tests.

Consider adding m.AssertExpectations(t) with a small sleep or a synchronization mechanism to verify the goroutine completes its legacy call.


49-54: Shared *mock.Mock instance for both legacy and storage makes it impossible to verify backend-specific calls.

All test functions create a single m := &mock.Mock{} that backs both legacyStoreMock and storageMock. When both backends expose the same method name (e.g., Create, Update), the mock cannot distinguish which backend received the call. This means tests can't verify that, for example, Storage.Create was called but Legacy.Create was not (in the error path), or vice versa.

Consider using separate *mock.Mock instances for ls and us to enable independent call verification.

Also applies to: 108-113, 216-222, 270-276, 328-334


119-120: Local p shadows the package-level Prometheus registry.

Line 119 declares p := prometheus.NewRegistry(), shadowing the package-level p used in all other test functions. This is inconsistent but doesn't cause functional issues currently.

Comment on lines 43 to +48
if err != nil {
log.Error(err, "unable to create object in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err
}
d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Wrong metric recorder on storage error — records legacy duration instead of storage duration.

When Storage.Create fails at Line 43, Line 45 calls d.recordLegacyDuration(...). This should be d.recordStorageDuration(...) since the measured operation is the storage call, not the legacy call. The same bug appears in Update at Line 129.

🐛 Proposed fix
 	if err != nil {
 		log.Error(err, "unable to create object in storage")
-		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
 		return created, err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err != nil {
log.Error(err, "unable to create object in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err
}
d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
if err != nil {
log.Error(err, "unable to create object in storage")
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err
}
d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 43 - 48, In
dualwriter_mode3.go change the error-path metric recorder to the storage
recorder: when Storage.Create fails (the block that currently calls
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) after
log.Error) replace that call with d.recordStorageDuration(true, mode3Str,
options.Kind, method, startStorage); do the same fix in the Update path where
the storage error currently records legacy duration (replace the
d.recordLegacyDuration call with d.recordStorageDuration using the same
parameters).

Comment on lines +50 to +57
go func() {
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
defer cancel()

startLegacy := time.Now()
_, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)
}()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Background goroutines inherit the request-scoped context, which will be cancelled when the HTTP handler returns.

The goroutine derives its timeout from ctx, which is the request context. Once the HTTP response is sent (Line 59), the request context is cancelled, and the legacy Create call will be immediately aborted rather than getting the intended 10-second window. The same pattern affects Delete (Line 110), Update (Line 135), and DeleteCollection (Line 163).

Use context.Background() (or a detached copy preserving only values like trace IDs) as the parent for the timeout context in all goroutines.

🐛 Proposed fix for Create (apply the same pattern to Delete, Update, DeleteCollection)
 	go func() {
-		ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
+		ctx, cancel := context.WithTimeoutCause(context.Background(), time.Second*10, errors.New("legacy create timeout"))
 		defer cancel()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
go func() {
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy create timeout"))
defer cancel()
startLegacy := time.Now()
_, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)
}()
go func() {
ctx, cancel := context.WithTimeoutCause(context.Background(), time.Second*10, errors.New("legacy create timeout"))
defer cancel()
startLegacy := time.Now()
_, errObjectSt := d.Legacy.Create(ctx, obj, createValidation, options)
d.recordLegacyDuration(errObjectSt != nil, mode3Str, options.Kind, method, startLegacy)
}()
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 50 - 57, The background
goroutine for legacy operations is using the incoming request context
(context.WithTimeoutCause(ctx,...)) so it gets cancelled when the HTTP handler
returns; change the parent to a detached context (e.g., context.Background() or
a background context that copies only needed values like trace IDs) when
creating the timeout context for d.Legacy.Create, and apply the same pattern for
the other background goroutines that call d.Legacy.Delete, d.Legacy.Update, and
d.Legacy.DeleteCollection; ensure you still set the 10s timeout, call cancel()
via defer, and keep the call to d.recordLegacyDuration unchanged.

Comment on lines +94 to +106
func (d *DualWriterMode3) Delete(ctx context.Context, name string, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions) (runtime.Object, bool, error) {
var method = "delete"
log := d.Log.WithValues("name", name, "kind", options.Kind, "method", method)
ctx = klog.NewContext(ctx, d.Log)

startStorage := time.Now()
res, async, err := d.Storage.Delete(ctx, name, deleteValidation, options)
if err != nil {
log.Error(err, "unable to delete object in storage")
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
}
d.recordStorageDuration(false, mode3Str, name, method, startStorage)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Two issues in Delete: inconsistent log context and wrong kind argument in metric recording.

  1. Line 97: klog.NewContext(ctx, d.Log) uses the struct-level d.Log instead of the enriched local log (which includes name/kind/method). Every other method passes log.
  2. Line 106: d.recordStorageDuration(false, mode3Str, name, method, startStorage) passes the object name as the kind argument. Line 103 correctly passes options.Kind. This will produce misleading metric labels.
🐛 Proposed fix
 	log := d.Log.WithValues("name", name, "kind", options.Kind, "method", method)
-	ctx = klog.NewContext(ctx, d.Log)
+	ctx = klog.NewContext(ctx, log)
 
 	startStorage := time.Now()
 	res, async, err := d.Storage.Delete(ctx, name, deleteValidation, options)
 	if err != nil {
 		log.Error(err, "unable to delete object in storage")
 		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
 		return res, async, err
 	}
-	d.recordStorageDuration(false, mode3Str, name, method, startStorage)
+	d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (d *DualWriterMode3) Delete(ctx context.Context, name string, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions) (runtime.Object, bool, error) {
var method = "delete"
log := d.Log.WithValues("name", name, "kind", options.Kind, "method", method)
ctx = klog.NewContext(ctx, d.Log)
startStorage := time.Now()
res, async, err := d.Storage.Delete(ctx, name, deleteValidation, options)
if err != nil {
log.Error(err, "unable to delete object in storage")
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
}
d.recordStorageDuration(false, mode3Str, name, method, startStorage)
func (d *DualWriterMode3) Delete(ctx context.Context, name string, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions) (runtime.Object, bool, error) {
var method = "delete"
log := d.Log.WithValues("name", name, "kind", options.Kind, "method", method)
ctx = klog.NewContext(ctx, log)
startStorage := time.Now()
res, async, err := d.Storage.Delete(ctx, name, deleteValidation, options)
if err != nil {
log.Error(err, "unable to delete object in storage")
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
}
d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
}
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 94 - 106, The Delete
method uses klog.NewContext(ctx, d.Log) instead of the enriched local logger and
passes name where kind should be recorded; change the context call to
klog.NewContext(ctx, log) so the enriched logger (log) is used, and update the
metric call d.recordStorageDuration(false, mode3Str, /*kind*/ options.Kind,
method, startStorage) to pass options.Kind instead of name; these fixes are in
the Delete function and involve the symbols klog.NewContext, d.Log, log, and
d.recordStorageDuration.

Comment on lines +125 to +132
startStorage := time.Now()
res, async, err := d.Storage.Update(ctx, name, objInfo, createValidation, updateValidation, forceAllowCreate, options)
if err != nil {
log.WithValues("object", updated).Error(err, "could not update or create object")
return nil, false, err
}
objInfo = &updateWrapper{
upstream: objInfo,
updated: updated,
log.Error(err, "unable to update in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
}
d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Same metric recorder bug in Update — recordLegacyDuration called for a storage error.

Line 129 records legacy duration when the storage update fails. Should be recordStorageDuration, consistent with other methods (e.g., Delete at Line 103, DeleteCollection at Line 156).

🐛 Proposed fix
 	if err != nil {
 		log.Error(err, "unable to update in storage")
-		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
 		return res, async, err
 	}
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 125 - 132, The storage
update error path in Update currently calls d.recordLegacyDuration(true,
mode3Str, options.Kind, method, startStorage); replace that call with
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) so
the storage duration metric is recorded on storage errors (match the success
call to recordStorageDuration and other methods like Delete/DeleteCollection).

Comment on lines +161 to +167
go func() {
startLegacy := time.Now()
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
defer cancel()
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

DeleteCollection goroutine records storage duration instead of legacy duration.

Line 166 calls d.recordStorageDuration(...) for the legacy DeleteCollection operation inside the goroutine. This should be d.recordLegacyDuration(...) to match the pattern in Create (Line 56), Delete (Line 113), and Update (Line 140).

🐛 Proposed fix
 	go func() {
 		startLegacy := time.Now()
 		ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
 		defer cancel()
 		_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
-		d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
+		d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
 	}()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
go func() {
startLegacy := time.Now()
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
defer cancel()
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()
go func() {
startLegacy := time.Now()
ctx, cancel := context.WithTimeoutCause(ctx, time.Second*10, errors.New("legacy deletecollection timeout"))
defer cancel()
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
}()
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 161 - 167, The goroutine
handling the legacy DeleteCollection mistakenly calls d.recordStorageDuration
instead of d.recordLegacyDuration; update the call in the anonymous goroutine
that invokes d.Legacy.DeleteCollection(...) to call d.recordLegacyDuration(err
!= nil, mode3Str, options.Kind, method, startLegacy) so the legacy path's timing
is recorded consistently with Create/Delete/Update.

Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant